Support async/await session model #130
The first option is to eliminate thread-local variables in FASTER entirely, and have logical sessions that may move across threads. This will allow FASTER to provide persistence guarantees for a logical session (e.g., commit/persist all operations with LSN < k on the logical session), regardless of which thread the session is currently executing on. This means users can seamlessly use other async operations in their code. Every FASTER operation would then need a session ID parameter, which is a bit ugly. Perhaps the default could still be “threads==sessions” behavior by using a session-free overload, where a thread-local variable is used to store the default session ID.

Currently, FASTER operations that go async are affinitized back to the thread that issued the I/O. The client is supposed to call CompletePending to "resume" these operations on that thread. We could leave this as it is, with logical sessions expected to call CompletePending to complete the I/Os on that session. However, it may instead be interesting to support the async model for these operations, where the operation returns a Task instead. Internally, we could use

Note that switching a session from one thread to another is generally expensive, as the session context needs to move to the other thread. However, ideally, this would happen rarely. In the common synchronous case, we would expect a thread to have a single session to FASTER, leading to high performance very similar to the current thread-local session model. |
How would you like to use |
One option to do away with threads could be for the lookup to pool sessions, renting them out and taking them back as needed. That could enable user syntax such as this:
using (ISession session = lookup.OpenSession())
{
    await session.UpsertAsync( /* stuff here */ );
}
In the model above, the
This is what the Orleans sample is doing now. The drawback is that we now need an extra allocation for every operation, even those completing on the sync path. At this time it is hard to tell what performance difference it makes due to the overhead from everything else. An implementation using
I think the pooled session model above could work, by making the session pool return the session that is affinitized to the running thread (instead of just an id) and only create a new one if it does not exist yet. So it would be more of a I'll take a stab at the |
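For readers following along, the rent-and-return idea discussed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `FakeSession`, `SessionPool` and `Lease` are hypothetical names, not FASTER types, and the pool here is a plain bag of idle sessions rather than anything affinitized to a thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a FASTER session; not part of the FASTER API.
public sealed class FakeSession
{
    public int Id { get; }
    public FakeSession(int id) { Id = id; }
}

// Minimal pool: hands out an idle session if one exists, otherwise creates one.
public sealed class SessionPool
{
    private readonly ConcurrentBag<FakeSession> _idle = new ConcurrentBag<FakeSession>();
    private int _created;

    // Total number of sessions ever created by this pool.
    public int Created => Volatile.Read(ref _created);

    public Lease Rent()
    {
        if (!_idle.TryTake(out FakeSession session))
            session = new FakeSession(Interlocked.Increment(ref _created));
        return new Lease(this, session);
    }

    private void Return(FakeSession session) => _idle.Add(session);

    // Disposing the lease returns the session to the pool exactly once.
    public sealed class Lease : IDisposable
    {
        private readonly SessionPool _pool;
        private FakeSession _session;

        internal Lease(SessionPool pool, FakeSession session)
        {
            _pool = pool;
            _session = session;
        }

        public FakeSession Session =>
            _session ?? throw new ObjectDisposedException(nameof(Lease));

        public void Dispose()
        {
            FakeSession session = Interlocked.Exchange(ref _session, null);
            if (session != null) _pool.Return(session);
        }
    }
}
```

With this shape, `using (var lease = pool.Rent()) { /* use lease.Session */ }` gives the scoped syntax from the earlier comment. Note that two leases held at the same time get two distinct sessions, so the number of sessions tracks the number of concurrent holders rather than the number of threads.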
Indeed, this may have some merit, as operations in FASTER can complete sync or async. |
|
At least this can get the shell
You're right, that would open a can of worms, if it would even work at all. Best to split those concerns from the onset. As a first step, I'd just exploit the context parameter as the Orleans sample does it, and either make the context on the |
I think the pooled session model is appropriate. The sessions would essentially be async-local instead of thread-local. Alternatively, each grain could rent a session at activation and return it at deactivation. That would require something to pulse Thread-local has issues with a dynamically sized thread pool such as .NET's ThreadPool (which grains do not use currently) |
Okay, I'm just getting back to this topic after some travel. I plan to work on decoupling threads and sessions in Faster now, on this branch. The epoch table will contain one row per session. There will be no thread local variables any more. When someone closes a session, the epoch table entry will become available for another session to rent out. @ReubenBond, the session is a local variable for the application and thus effectively 'async local'. Is this what you meant, or were you thinking of AsyncLocal? The latter seems not very useful to me, perhaps I am missing something ... |
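As a rough illustration of "one row per session, released on close and rented again", here is a minimal sketch of a fixed-size slot table claimed with compare-and-swap. This is not FASTER's epoch implementation; `SlotTable`, `Acquire` and `Release` are hypothetical names, and a real epoch table would also carry per-row epoch state.

```csharp
using System;
using System.Threading;

// Illustrative only: a fixed-size table whose rows are claimed by sessions.
public sealed class SlotTable
{
    // 0 means the row is free; any other value is the owning session id.
    private readonly int[] _owners;

    public SlotTable(int capacity) { _owners = new int[capacity]; }

    // Claims the first free row for the session; returns -1 if the table is full.
    public int Acquire(int sessionId)
    {
        if (sessionId == 0)
            throw new ArgumentException("0 is reserved for free rows", nameof(sessionId));

        for (int i = 0; i < _owners.Length; i++)
        {
            // Atomically take the row only if it is currently free.
            if (Interlocked.CompareExchange(ref _owners[i], sessionId, 0) == 0)
                return i;
        }
        return -1;
    }

    // Frees the row so that another session can claim it.
    public void Release(int slot, int sessionId)
    {
        if (Interlocked.CompareExchange(ref _owners[slot], 0, sessionId) != sessionId)
            throw new InvalidOperationException("Row is not owned by this session.");
    }
}
```

Because rows are claimed by session id rather than by thread id, nothing in this sketch depends on which thread the session happens to be running on.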
Rough sketch of sessions checked in, without touching the existing thread-local framework: session-based access just updates the existing thread-local variables with the session's values before proceeding with the operation as usual. |
@JorgeCandeias -- I added some preliminary support for pooled sessions. We need to see whether this can improve the Orleans integration. See samples at https://github.com/microsoft/FASTER/blob/async-support/cs/test/SessionFASTERTests.cs
I'm still a bit fuzzy on how exactly this would map to the Orleans threading model, and would appreciate your thoughts. E.g., since individual grains may perform other async ops at any time, they may be holding an active FASTER session when they go async, right? In that case, when a different grain is scheduled on that thread, it would need to "rent" a different session. Thus, the number of active sessions may grow to more than the number of threads -- it would depend on the number of pending activated grains. |
@JorgeCandeias -- any comments? Would this version be of interest to you to improve the FASTER-Orleans sample? And what do you think of adding a true async interface to FASTER? |
Hi @badrishc, many apologies for the delay, other things moved up on my list in the meantime, but this one is still on it. Yes, this version is already useful for improving the sample, so I'll take it on board, thank you! That said, being a synchronous interface, it will still have to run on the thread pool in order not to limit throughput on the Orleans task scheduler when spinning for I/O completions. A true async interface with no blocking or background threads will enable direct integration with the Orleans scheduler, removing the need to rely on the thread pool. More importantly, it will become easier to promote this to product developers looking for straightforward solutions. 😄
This is correct. It has less to do with grains and more with TPL tasks. Although the Orleans scheduler only creates as many worker threads as processors, there can be a great many tasks pending in the system, awaiting something non-CPU-bound, like I/O operations, to complete. In the meantime other tasks will run. For background, Orleans does this to keep the CPU doing actual user work as much as possible and to avoid wasting cycles on context switching or spinning for I/O. The reason is that grain requests on Orleans tend to be very fine-grained on the CPU, e.g. microseconds or less (even if they await I/O tasks for longer), so having an Orleans worker thread spin even for a few milliseconds can lower the throughput ceiling. Orleans is opinionated on this and enforces responsiveness in the system by killing any task taking over 30s and issuing warnings when tasks take over 200ms, regardless of CPU usage. In my head, for an async/await model to work with the Orleans scheduler, I think...
I'm leaning in favour of isolated/pooled sessions because they mimic the behavior of well-known expensive resources such as I attempted to apply the Maybe you two have a better trick for this that I can learn from?
This matches the intent of the sample and naturally follows into creating generic grain templates for any type, or with an eye in the future, providing an injected faster lookup for some type and underlying storage via DI bindings, azure function style. |
One minor point: we won't kill Tasks, since .NET Tasks are co-operative & aren't preemptible. We will throw a timeout exception back to the caller, though, so that they aren't left hanging forever. We will also destroy the grain activation if it has become unresponsive for a very long time.
Just to be precise here: a grain method call can hop between threads between
Yes, precisely correct. This goes for any async method in .NET (i.e., it's not Orleans-specific) |
Thanks for this important clarification 👍
What I meant was it assumes developers won't willingly pass sessions around in a way that allows concurrent access. |
How about this: we slightly modify FASTER's API to support "ref Context" instead of "Context". The user then passes in a "null" context to the lookup. If/when an operation goes async, it can fill up/augment this context via a new user-specified context provider callback. In our use case, this callback will allocate the
void UpdatePendingContext(ref context)
{
    context = new TaskCompletionSource<TOutput>(...);
} |
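To make the "allocate only when the operation goes pending" idea concrete, here is a small self-contained sketch. It is not FASTER code: `ToyStore`, its `Read` signature and the simulated I/O are all assumptions made for illustration. The only real API used is `TaskCompletionSource<T>` from the .NET base library.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical toy store; the real FASTER API differs.
public sealed class ToyStore
{
    // The single simulated outstanding I/O, if any.
    private Action _pendingIo;

    // Returns true if the read completed synchronously. On the pending path
    // the context is allocated lazily, so the sync path allocates nothing.
    public bool Read(int key, bool inMemory,
                     ref TaskCompletionSource<int> context, out int output)
    {
        if (inMemory)
        {
            output = key * 2; // pretend this is the stored value
            return true;      // context is left untouched (still null)
        }

        context = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // ref parameters cannot be captured by a lambda, so copy to a local.
        TaskCompletionSource<int> tcs = context;
        _pendingIo = () => tcs.SetResult(key * 2);

        output = default;
        return false;
    }

    // Simulates the I/O completing and resumes the awaiting caller.
    public void CompletePending()
    {
        Action io = _pendingIo;
        _pendingIo = null;
        io?.Invoke();
    }
}
```

A caller would check the return value and, only when it is false, `await context.Task`. `RunContinuationsAsynchronously` is used so that the code calling `CompletePending` does not end up running the caller's continuation inline.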
I have some queries that need to be resolved. See comments. O/w looks good to me.
We now have a full async-compliant sessions interface to FasterKv! You no longer need to call refresh or worry about thread affinity. The overall idea is to create new sessions to FasterKv and perform a sequence of operations on a session. You can also have a separate async committer (checkpointer) thread for durability. The async session operations can be configured to await until either after they are completed in memory or after they are made durable by a checkpoint. This is similar to the new FasterLog API. See any example in the playground for details (older API is marked obsolete for now). A good one is the newly added sample here. I think it almost cannot get easier than this for C# users, but would love to hear feedback from everyone. |
Async Sessions API
// Create new shared FASTER KV
var faster = new FasterKV(...);
// Create any number of sessions to FasterKV
var session = faster.NewSession();
// Async read, retrieves from disk if needed, reads uncommitted if present
(Status, Output) result = await session.ReadAsync(key);
// Read waits for uncommitted read value to checkpoint/commit before returning
(Status, Output) committedResult = await session.ReadAsync(key, waitForCommit: true);
// Upsert into mutable region, does not wait for commit
await session.UpsertAsync(key, value);
// Upsert into mutable region, return after this (and all prev) upserts commit
await session.UpsertAsync(key, value, waitForCommit: true);
// RMW into mutable region, does not wait for commit
await session.RMWAsync(key, input);
// RMW into mutable region, wait for commit
await session.RMWAsync(key, input, waitForCommit: true);
// Waits for all prev operations on session to commit before returning
await session.WaitForCommit();
Commits can be performed periodically by a separate thread/task:
int i = 0;
while (true)
{
Thread.Sleep(5000);
if (i++ % 100 == 0)
faster.TakeFullCheckpoint(out _); // periodically take full index + log checkpoint
else
faster.TakeHybridLogCheckpoint(out _); // take (incremental) log checkpoint
} |
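If the committer runs as a task rather than a dedicated thread, the same loop could be written with `Task.Delay` and a `CancellationToken`, so that it neither blocks a thread while waiting nor runs forever. The sketch below assumes a hypothetical `ICheckpointTarget` interface in place of the store, and `CountingTarget` exists only to make the example runnable. The cadence (one full checkpoint, then log-only checkpoints) mirrors the loop above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over the store's checkpoint calls.
public interface ICheckpointTarget
{
    void TakeFullCheckpoint();
    void TakeLogCheckpoint();
}

// Demo implementation that only counts calls.
public sealed class CountingTarget : ICheckpointTarget
{
    public int Full { get; private set; }
    public int LogOnly { get; private set; }
    public void TakeFullCheckpoint() => Full++;
    public void TakeLogCheckpoint() => LogOnly++;
}

public static class PeriodicCommitter
{
    // Every `interval`, takes a checkpoint: a full one on every
    // `fullEvery`-th round, a log-only one otherwise. Stops on cancellation.
    public static async Task RunAsync(ICheckpointTarget target, TimeSpan interval,
                                      int fullEvery, CancellationToken token)
    {
        for (long i = 0; ; i++)
        {
            try
            {
                // Unlike Thread.Sleep, this does not block a thread while waiting.
                await Task.Delay(interval, token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return; // graceful shutdown
            }

            if (i % fullEvery == 0)
                target.TakeFullCheckpoint();
            else
                target.TakeLogCheckpoint();
        }
    }
}
```

A caller might start it with `PeriodicCommitter.RunAsync(target, TimeSpan.FromSeconds(5), 100, cts.Token)` and call `cts.Cancel()` at shutdown.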
LGTM other than minor nits and clarifications. I didn't look at the internals too closely since I am not as familiar.
The FASTER model is based on a set of threads, each of which can start a session with FASTER, perform a sequence of operations, and later stop the session. Operations consist of bursts of activity, with periodic invocations of a Refresh() call to FASTER in between. It would not be okay for a thread/session to go to sleep (i.e., stop calling Refresh) without stopping its session. Sessions and threads are tightly integrated in this model.
However, this tight coupling of threads and sessions in FASTER is not ideal for an async/await API using the C# thread pool. This PR aims to investigate and prototype alternatives to this approach, and implement the best one. There is next to no code right now; for now, this PR serves as a place to discuss options and alternatives.
Adding @JorgeCandeias @gunaprsd @ReubenBond for comments.
Fix #140